
    A new iterative algorithm for computing a quality approximate median of strings based on edit operations

    This paper presents a new algorithm for computing an approximation to the median of a set of strings. The approximate median is obtained through successive improvements of a partial solution. In each iteration, the edit distance from the partial solution to every string in the set is computed, keeping track of the frequency of each edit operation at every position of the approximate median. A goodness index for each edit operation is then computed by multiplying its frequency by its cost. Each operation is tested, starting with the one with the highest index, to verify whether applying it to the partial solution leads to an improvement. If it does, a new iteration begins from the new approximate median. The algorithm finishes when all the operations have been examined without a better solution being found. Comparative experiments involving Freeman chain codes encoding 2D shapes and the Copenhagen chromosome database show that the quality of the approximate median string is similar to that of benchmark approaches, while convergence is much faster. This work is partially supported by the Spanish CICYT under project DPI2006-15542-C04-01, the Spanish MICINN through project TIN2009-14205-CO4-01, and by the Spanish research program Consolider Ingenio 2010: MIPRCV (CSD2007-00018)
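The update loop described above can be sketched in Python. This is a minimal illustration with unit edit costs; the function names, the tie-breaking, and the choice of the best set member as the initial solution are assumptions, not the paper's exact implementation:

```python
def edit_ops(src, tgt):
    """Levenshtein DP with unit costs; return the distance and one optimal
    sequence of edit operations as (position_in_src, kind, symbol) tuples."""
    n, m = len(src), len(tgt)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    ops, i, j = [], n, m            # backtrace one optimal edit script
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            if src[i - 1] != tgt[j - 1]:
                ops.append((i - 1, 'sub', tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append((i - 1, 'del', None))
            i -= 1
        else:
            ops.append((i, 'ins', tgt[j - 1]))
            j -= 1
    return D[n][m], ops

def approximate_median(strings):
    """Iteratively improve a partial solution: tally edit operations by
    position across the whole set, rank them by frequency (x unit cost),
    and apply the first one that lowers the total edit distance."""
    median = min(strings, key=lambda s: sum(edit_ops(s, t)[0] for t in strings))
    best = sum(edit_ops(median, t)[0] for t in strings)
    improved = True
    while improved:
        improved = False
        freq = {}
        for t in strings:
            for op in edit_ops(median, t)[1]:
                freq[op] = freq.get(op, 0) + 1
        for (pos, kind, sym), _ in sorted(freq.items(), key=lambda kv: -kv[1]):
            if kind == 'sub':
                cand = median[:pos] + sym + median[pos + 1:]
            elif kind == 'del':
                cand = median[:pos] + median[pos + 1:]
            else:
                cand = median[:pos] + sym + median[pos:]
            score = sum(edit_ops(cand, t)[0] for t in strings)
            if score < best:
                median, best, improved = cand, score, True
                break                # restart from the new approximate median
    return median, best
```

On each pass the operation table is rebuilt from the new median, so positions always refer to the current partial solution.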

    Analyzing Sentiment, Attraction Type, and Country in Spanish Language TripAdvisor Reviews Using Language Models

    This paper describes our participation in the Rest-Mex 2023 Sentiment Analysis Task. We proposed an ensemble of (i) a cascade of transformer-based two-class classifiers biased towards lowering the mean absolute error in polarity, and (ii) multi-class transformer-based classifiers for predicting the type and location of the messages. Our system achieved a sentiment track score of 0.719. This research has been funded by the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) through the project NL4DISMIS: Natural Language Technologies for Dealing with dis- and misinformation (CIPROM/2021/021), and by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR through the project ClearText

    Boosting Perturbation-Based Iterative Algorithms to Compute the Median String

    [Abstract] The most competitive heuristics for computing the median string are perturbation-based iterative algorithms. Given the complexity of this problem, which is NP-hard under many formulations, the computational cost of an exact solution is not affordable. This work addresses the heuristic algorithms that solve this problem, with emphasis on their initialization and on the policy used to order candidate edit operations. Both factors have a significant weight in the solution: the initial string influences the algorithm's speed of convergence, as does the criterion used to select the modification made in each iteration. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we apply the Half-Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity among the members of the subset while still fulfilling the centrality criterion. We also analyze the stopping condition of the algorithm, improving its performance without substantially harming the quality of the solution. To analyze the results of our experiments, we measured the execution time of each proposed modification of the algorithms, the number of edit distances computed, and the quality of the solution obtained. With these experiments, we empirically validated our proposal. This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 programme under Marie Sklodowska-Curie Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by FONDECYT-CONICYT under Grant 1170497.
The work of Óscar Pedreira was supported in part by Xunta de Galicia/FEDER-UE under Grants CSI ED431G/01 and GRC ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco under VIPUCT Project 2020EM-PS-08, and in part by Project FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco.
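The HSP test used for initialization can be sketched generically. This is a minimal version over an abstract distance function; the paper applies it with the edit distance around the dataset median, and the names below are illustrative:

```python
def half_space_proximal(query, points, dist):
    """Half-Space Proximal (HSP) test: repeatedly pick the candidate
    nearest to the query, then discard every remaining candidate that
    lies closer to that pick than to the query. The survivors of each
    round are spread around the query in different 'directions', which
    gives the diversity-with-centrality property used for the subset."""
    candidates = [p for p in points if p != query]
    neighbours = []
    while candidates:
        u = min(candidates, key=lambda p: dist(query, p))
        neighbours.append(u)
        # keep only candidates at least as close to the query as to u
        candidates = [p for p in candidates if dist(p, u) >= dist(p, query)]
    return neighbours
```

With Euclidean points around the origin, a point hidden "behind" a nearer point in the same direction is eliminated, while points in other directions survive.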

    A Review of Research-Based Automatic Text Simplification Tools

    In the age of knowledge, the democratisation of information through the Internet may not be as pervasive if written language poses challenges to particular sectors of the population. The objective of this paper is to present an overview of research-based automatic text simplification tools. We describe aspects such as the language, the language phenomena and language levels simplified, the approaches used, the specific target populations the tools are created for (e.g. individuals with cognitive impairment or attention deficit, elderly people, children, language learners), and accessibility and availability considerations. The review of existing studies on automatic text simplification tools was conducted by searching two databases: Web of Science and Scopus. The eligibility criteria required text simplification tools with a scientific background, in order to ascertain how they operate. This methodology yielded 27 text simplification tools, which are further analysed. The main conclusions of this review include the lack of resources accessible to the public; the need for customisation that fosters the individual's independence, allowing users to select what they find challenging to understand without limiting their capabilities; and the need for more simplification tools in languages other than English. This research was conducted as part of the Clear-Text project (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR

    T2KG: Transforming Multimodal Document to Knowledge Graph

    The large amount of information in digital format that exists today makes it unfeasible to acquire the knowledge contained in these documents by manual means. It is therefore necessary to develop tools that incorporate this knowledge into a structure that is easy to use for both machines and humans. This paper presents a system that can incorporate the relevant information from a document in any format, structured or unstructured, into a semantic network representing the knowledge in the document. The system processes everything from structured documents, relying on their annotation schemes, to unstructured documents written in natural language, for which it uses a set of sensors that identify the relevant information and incorporate it into the semantic network, which is built by linking all the information based on the knowledge discovered. This work has been partially supported by the Valencian Agency for Innovation through the project INNEST/2022/24, "T2Know: Platform for advanced analysis of scientific-technical texts to extract trends and knowledge through NLP techniques"; partially funded by the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) through the project NL4DISMIS: TLHs for an Equal and Accessible Inclusive Society (CIPROM/2021/021); and partially supported by the project MODERATES (TED2021-130145B-I00) of the Spanish Government
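The semantic-network target of such a pipeline can be illustrated with a minimal triple store; the class name and the example facts below are hypothetical, and the actual system's sensors and annotation handling are far richer:

```python
from collections import defaultdict

class SemanticNetwork:
    """Minimal triple-based semantic network: nodes are entities and
    labelled edges are relations extracted from documents. Facts from
    structured metadata and from free-text 'sensors' land in the same
    graph, which is how the linking/enrichment step works."""
    def __init__(self):
        self.edges = defaultdict(set)        # subject -> {(relation, object)}

    def add_fact(self, subject, relation, obj):
        self.edges[subject].add((relation, obj))

    def neighbours(self, subject):
        """All (relation, object) pairs known for a subject."""
        return sorted(self.edges[subject])

net = SemanticNetwork()
net.add_fact("report-42", "authored_by", "J. Doe")     # from structured metadata
net.add_fact("report-42", "mentions", "NLP")           # from a free-text sensor
```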

    Multidimensional Data Analysis for Enhancing In-Depth Knowledge on the Characteristics of Science and Technology Parks

    This paper examines the role played by science and technology parks (STPs) in technology transfer, industrial innovation, and economic growth. Accurate monitoring of their evolution and impact is hindered by the lack of uniformity in STP models and goals, and by the scarcity of high-quality datasets. This work uses existing terminologies, definitions, and core features of STPs to conduct a multidimensional data analysis that explores and evaluates the 21 core features describing the key internal factors of an STP. The core features are gathered from a reliable and updatable dataset of Spanish STPs. The methodological framework, based on descriptive techniques and machine-learning tools, can be replicated in other STP contexts. The results of the study provide an overview of the general situation of STPs in Spain, validate the existence and characteristics of three types of STPs, and identify their typical features. Moreover, the prototype STP can be used as a benchmark against which other STPs can identify the features that need to be improved. Finally, this work makes it possible to classify STPs and to support prediction and decision making for innovation ecosystems. This research work has been partially funded by the Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21); the Ministry of Science and Innovation through PID2021-123956OB-I00 (CORTEX) and PID2021-122263OB-C22 (COOLANG); and the R&D project CLEARTEXT TED2021-130707B-I00
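The prototype-as-benchmark idea can be sketched as follows; the feature names and values are invented for illustration, whereas the paper works with 21 core features gathered from a dataset of Spanish STPs:

```python
def prototype(parks):
    """Feature-wise mean of the feature vectors of a set of parks,
    giving a 'prototype STP' that other parks can be compared against."""
    n = len(parks)
    return [sum(park[i] for park in parks) / n for i in range(len(parks[0]))]

def features_to_improve(park, proto, names):
    """Core features where a given park falls below the prototype value."""
    return [name for name, value, ref in zip(names, park, proto) if value < ref]

# Hypothetical core features and values, for illustration only.
names = ["tenant_firms", "rnd_staff_share", "incubator_capacity"]
parks = [[120, 0.30, 25], [80, 0.45, 40], [40, 0.15, 10]]
```

Benchmarking the weakest park in this toy dataset against the prototype flags all three features as candidates for improvement.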

    GPLSI AitanaWEB: A Virtual Assistant for Academic Enrollment Processes

    AitanaWEB is a chatbot that provides online assistance to users on academic enrollment processes and related matters. It offers information on topics such as timetables, admission cut-off marks, enrollment, and transfer of academic records, among others. It is designed with accessibility and usability in mind, supporting access in Valencian and Spanish. It can be accessed from different browsers, such as Chrome, Firefox, Edge, Chrome for Android, and Safari. It incorporates a narrator and speech-recognition facilities, allowing the user to interact with the system by voice. The project comprises two basic components: (i) Google DialogFlow as the artificial-intelligence service in which the questions and answers are structured and trained, and (ii) an in-house component that acts as controller, interpreter, and router between DialogFlow and the end-user interface. Universidad de Alicante; Ministerio de Educación, Cultura y Deporte and Ministerio de Economía y Competitividad (MINECO) through the projects LIVING-LANG (RTI2018-094653-B-C22) and INTEGER (RTI2018-094649-B-I00); Generalitat Valenciana through the SIIA project (PROMETEO/2018/089, PROMETEU/2018/089); supported by COST Actions CA19134 "Distributed Knowledge Graphs" and CA19142 "Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility"

    PCT Observer

    PCT Observer is a web application for visualizing and analyzing data on science and technology parks. It allows the user to discover statistically significant differences or relationships between key indicators, compare their evolution over time, and determine the most relevant indicators for characterizing the different park types. Universidad de Alicante; Ministerio de Educación, Cultura y Deporte and Ministerio de Economía y Competitividad (MINECO) through the projects LIVING-LANG (RTI2018-094653-B-C22) and INTEGER (RTI2018-094649-B-I00); Generalitat Valenciana through the SIIA project (PROMETEO/2018/089, PROMETEU/2018/089)
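A minimal example of the kind of indicator-relationship statistic such a tool can expose is the Pearson correlation between two indicators; the indicator values below are hypothetical, not taken from the application:

```python
def pearson_r(x, y):
    """Pearson correlation between two park indicators measured over the
    same set of parks (or the same time points): +1 for a perfect linear
    relationship, -1 for a perfectly inverse one, near 0 for none."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)
```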

    Detection of Regularities in 2D Contours, Approximate Median Computation, and Its Application to Classification Tasks

    This work focuses on two main problems: identifying similarities and obtaining the average of contours represented by Freeman chain codes. Both problems are addressed using information obtained while computing the Levenshtein distance between the strings that encode the contours. The thesis presents a new method to quantify the regularity between contours and compare them in terms of a similarity criterion. This criterion finds subsequences of the minimum-cost edit-operation string that specify an alignment between pairs of segments, one in each contour, which are considered similar according to two externally defined parameters. The information about the similar parts of each contour is encoded by a string representing the "average" of the segments. From the identified similarities, a procedure is defined to build a prototype that represents the contours. To evaluate how representative this instance is, groups of contours are replaced by their prototype in the training set of a K-NN classifier. In this way the size of the training set is reduced without noticeably affecting its representational power. Experimental results show that this procedure can reduce the training set by nearly 80% while the classification error increases by only 0.45% on one of the databases studied. In addition, a new algorithm is proposed for the fast computation of an approximation to the mean of two strings representing a 2D contour, together with a greedy implementation that significantly reduces the time needed to build the approximate mean contour. This is combined with a new editing method that relaxes the constraints imposed by Wilson's algorithm for discarding an instance.
In practice, not all instances misclassified by their neighbours are removed. Instead, an artificial instance is added to the set, in the hope that the original instance will be classified correctly in the future. The artificial instance is the mean of the misclassified instance and its nearest neighbour of the same class. Experiments on three well-known contour datasets show that the proposed algorithms improve on other methods described in the literature for computing the average of two contours. The low computational cost of the greedy implementation makes it well suited to long Freeman chain codes. Empirical results also show that the described editing procedure reduces the classification error in 83% of cases, regardless of the algorithm used to obtain the average contour. Finally, an algorithm is described for building an approximation to the mean of a set of strings, obtained through successive improvements of a partial solution. In each iteration, the distance from the partial solution to every string in the set is computed, keeping track of the frequency of the edit operations at each position of the approximate mean. This information is used to compute a quality index by multiplying the frequency by the cost of the operation. Each operation is evaluated, starting with the one with the highest quality, to assess whether applying it leads to an improvement. If so, a new iteration begins from the new solution. The algorithm ends after all operations have been examined without an improvement being found. Comparative experiments using Freeman chain codes show that approximations equivalent to those of other approaches can be obtained in less time
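The relaxed Wilson-style editing can be sketched as follows. This is a 1-NN, generic-distance illustration with toy 2D points; the thesis applies the idea to Freeman chain codes, with the string-mean algorithm playing the role of the `mean` function:

```python
def relaxed_wilson_editing(X, y, dist, mean):
    """Relaxed Wilson editing: instead of deleting a training instance
    misclassified by its nearest neighbour, keep it and add an artificial
    instance -- the mean of the instance and its nearest neighbour of the
    same class -- so the original is classified correctly afterwards."""
    extra = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [(dist(xi, xj), yj, xj)
                  for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        _, nn_label, _ = min(others, key=lambda t: t[0])
        if nn_label != yi:                       # 1-NN misclassifies xi
            same = [(d, xj) for d, yj, xj in others if yj == yi]
            if same:
                _, nearest_same = min(same, key=lambda t: t[0])
                extra.append((mean(xi, nearest_same), yi))
    return list(zip(X, y)) + extra
```

A borderline instance surrounded by the other class gains a same-class "anchor" halfway toward its nearest same-class neighbour, rather than being dropped.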

    Cascade of Biased Two-class Classifiers for Multi-class Sentiment Analysis

    In this paper, we describe our participation in the Rest-Mex 2021 Sentiment Analysis Task. Our approach is based on an ensemble of BERT/BETO-based classifiers arranged in a cascade of binary models trained with a bias towards specific classes, with the aim of lowering the mean absolute error. The resulting models were ranked in 2nd and 3rd place according to the Mean Absolute Error evaluation rule. This research work has been partially funded by the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects SIIA (PROMETEO/2018/089, PROMETEU/2018/089) and LIVING-LANG (RTI2018-094653-B-C22), and by the Vice Chancellor for Research and Postgraduate Studies Office of the Universidad Católica de Temuco, VIPUCT Project No. 2020EM-PS-08, and FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco
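One way such a cascade can turn binary decisions into an ordinal multi-class prediction is sketched below. The structure is an assumption inferred from the description, the keyword rules stand in for the trained BERT/BETO stages, and polarity is taken to be on a 1-5 scale:

```python
def cascade_predict(text, stages):
    """Ordinal cascade: stage k is a binary classifier answering
    'is the polarity greater than k?'. Descend through the stages until
    one answers no. Because a wrong stage decision only shifts the
    prediction to an adjacent class, errors tend to be small in rank
    terms, which lowers the mean absolute error."""
    polarity = 1
    for is_greater in stages:
        if not is_greater(text):
            break
        polarity += 1
    return polarity

# Hypothetical keyword 'classifiers' standing in for the transformer stages.
stages = [
    lambda t: "terrible" not in t,               # polarity > 1 ?
    lambda t: "bad" not in t,                    # polarity > 2 ?
    lambda t: "great" in t or "love" in t,       # polarity > 3 ?
    lambda t: "love" in t,                       # polarity > 4 ?
]
```

Biasing each binary stage toward its adjacent classes is what the training objective described above optimizes; the cascade itself is just this descent.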